Abstract

We present a method to construct a network null-model based on the maximum entropy
principle with the restrictions that the rich-club and the degree sequence are conserved. We show
that the probability that two nodes share a link can be described with a simple probability mass
function, which in turn allows us to approximate the maximum entropy solution for large
networks. As an example, we evaluate the null-model of three real networks and show that the
average degree-degree correlation is well approximated by the null-model.

1 Introduction 

Information measures based on Shannon's entropy are used to characterise the complexity of 
networks [U El E]. Shannon's entropy can also be used to obtain the network ensemble that 
best describes our state of knowledge of the network structure. In this case the maximal 
entropy approach (MAXENT) is used to describe our state of knowledge in a way that is 
"maximally noncommittal" by certain criterion [3]. The MAXENT solutions represent the 
best predictions we are able to make based on the given information. Maximising the entropy 
defines a network ensemble that can be considered as a null-model of the network under certain 
structural constraints. The null-model is used to make inferences in network problems.

Recently, using entropic measures, Bianconi |2j 15] has formulated how to evaluate different
network ensembles that conserve different structural constraints like the degree sequence, the
average degree-degree distribution and the community structure. More recently Johnson et al. [6],
using the MAXENT approach, found that for a scale-free network defined only by its degree
sequence the most likely structure of the network, the null-model, is disassortative.
Their result was obtained using the ansatz that the average degree-degree correlation of the
network can be described with a power law.

Here, using MAXENT, we construct an ensemble of networks that is defined by the rich-club 
coefficient and the degree sequence. There are several reasons why we impose these network 
restrictions. In scale-free networks [7] the connectivity of the rich-club plays an important role
in the functionality of the network [8], for example in the transmission of rumours in social
networks [9], the efficient delivery of information in the Internet [10] and the organisation of
the human brain connectivity [11]. Also, the approach to describe a network using the average
degree-degree correlation has been criticised as it could be ambiguous when classifying the
assortativity of a network [HJ. The rich-club coefficient does not suffer from these disadvantages
and it is also related to the degree-degree correlation [12]. Finally, recently we introduced a
method to build surrogate networks based on the conservation of the degree sequence and 
rich-club coefficient [13]. We would like to know if the surrogates are biased. 






Section 2 gives a brief background on how to construct surrogate networks that conserve the
rank-based rich-club coefficient. Section 3 evaluates the MAXENT solution for network
ensembles generated by conserving the rich-club coefficient and degree sequence. We provide a
formula to evaluate the probability mass function describing the node-to-node connectivity. In
Section 4 we show some examples, based on real networks, of how to evaluate the MAXENT
solution and relate some properties of this solution to the structure of the network. We also
show how to approximate the MAXENT solution for large networks. Section 5 contains our
conclusions.



2 Surrogates that conserve the rich-club coefficient

If the nodes are ranked in decreasing order of their degree, so that the first node has the
highest degree, the second node the second highest degree and so on, we can characterise the
network connectivity using the node's degree $k_r$ and the number of links $\Delta E(r)$ that
node $r$ shares with nodes of higher degree. In other words, $\Delta E(r)$ is the number of links
that node $r$ shares with the nodes $r' \in [1, r-1]$. The total number of links is
$L = \sum_{i=1}^{N} \Delta E(i)$ and the rich-club coefficient [14] is

\[ \Phi(r) = \frac{2}{r(r-1)} \sum_{i=1}^{r} \Delta E(i), \tag{1} \]

which is the density of links between the top $r$ ranked nodes. It is possible to generate a
surrogate network that conserves $\Delta E(r)$ for all $r$, which is equivalent to conserving the
rich-club $\Phi(r)$. Let us assume that $P(r', r)$ is the probability that node $r$ connects to
node $r'$ and that $P(r, r) = 0$ as self-loops are not allowed. Given the $\Delta E(r)$ links, we
constrain the connectivity of a network by imposing the condition that the average number of
links $\overline{\Delta E}(r)$ satisfies

\[ \overline{\Delta E}(r) = \sum_{i=1}^{r-1} P(i, r) = \Delta E(r). \tag{2} \]

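Under this condition the average degree $\bar{k}_r$ of node $r$ is

\[ \bar{k}_r = \sum_{r'=1}^{N} P(r', r) = \Delta E(r) + \sum_{j=r+1}^{N} P(r, j), \tag{3} \]

with standard deviation $\sigma_r$ given by $\sigma_r^2 = \sum_{r'=1}^{N} P(r', r)(1 - P(r', r))$.
The average degree of the nearest-neighbours [15] of a node with degree $k$ is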




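\[ k_{nn}(k) = \frac{1}{N_k} \sum_{r=1}^{N} \frac{\delta_{k_r,k}}{k} \sum_{r'=1}^{N} k_{r'}\, P(r', r), \tag{4} \]

where the Kronecker delta is introduced to consider only nodes with degree equal to $k$, $N_k$ is
the number of $k$-degree nodes and the term $1/k$ is a normalisation factor.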
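
The quantities used in this section can be computed directly from an edge list. The following is a minimal sketch, not the authors' code: the function name `rank_quantities` and the rule that ties in degree are broken by node label are assumptions made only for illustration, and the edge list is assumed to contain no self-loops.

```python
from collections import defaultdict

def rank_quantities(edges):
    """Return (ranked_nodes, k, dE, phi) for an undirected edge list.

    k[r-1]   : degree of the node with rank r (rank 1 = highest degree)
    dE[r-1]  : number of links between node r and nodes of rank r' < r
    phi[r-1] : rich-club coefficient of Eq. (1); None for r = 1 (undefined)
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Assumption: ties in degree are broken by node label.
    ranked = sorted(degree, key=lambda n: (-degree[n], n))
    rank = {n: r for r, n in enumerate(ranked, start=1)}
    N = len(ranked)
    k = [degree[n] for n in ranked]
    dE = [0] * N
    for u, v in edges:
        lower = max(rank[u], rank[v])   # endpoint with the larger rank index
        dE[lower - 1] += 1
    phi = [None] * N
    total = dE[0]
    for r in range(2, N + 1):
        total += dE[r - 1]
        phi[r - 1] = 2.0 * total / (r * (r - 1))
    return ranked, k, dE, phi

# Toy network with four nodes and four links.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
ranked, k, dE, phi = rank_quantities(edges)
```

Because the ranking of equal-degree nodes is arbitrary, a different tie-breaking rule can change individual values of `dE` while leaving the degree sequence unchanged.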

Previously [13] we proposed that the probability can be factorised as
$P(r', r) = T(r', r)\,\Delta E(r)$, where $r' < r$ and $T(r', r)$ is a linking term. The simplest
case is when the $\Delta E(r)$ links are evenly distributed between node $r$ and the $r' < r$
nodes; then the probability that node $r$ connects to $r'$ is

\[ P(r', r) = T(r', r)\,\Delta E(r) = \frac{1}{r-1}\,\Delta E(r), \quad r' < r, \tag{5} \]

where $T(r', r) = 1/(r-1)$. We called this the egalitarian case.

For the case that node $r$ prefers to connect to nodes with lower rank (i.e. higher degree), we
proposed the preferential linking term $T(r', r) = (r')^{-\alpha}/S(r)$, where $\alpha > 0$ is a
constant and $S(r)$ is a normalisation factor. The probability that there is a link between $r$
and $r'$ is

\[ P(r', r) = \frac{(r')^{-\alpha}}{S(r)}\,\Delta E(r) = \left( \frac{(r')^{-\alpha}}{\sum_{i=1}^{r-1} i^{-\alpha}} \right) \Delta E(r), \tag{6} \]

where $S(r) = \sum_{i=1}^{r-1} i^{-\alpha}$ to ensure that $\sum_{i=1}^{r-1} P(i, r) = \Delta E(r)$.
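
As an illustration of Eqs. (2), (3), (5) and (6), the sketch below builds the expected number of links $P(r', r)$ for the egalitarian and the preferential linking terms and evaluates the average degree. The function names and the 0-based indexing are assumptions of this example, not part of the original method description.

```python
def linking_probabilities(dE, alpha=None):
    """Return P with P[i][j] = T(i+1, j+1) * dE[j] for i < j (0-based ranks).

    alpha=None uses the egalitarian term 1/(r-1) of Eq. (5);
    otherwise the preferential term r'^(-alpha)/S(r) of Eq. (6) is used.
    """
    N = len(dE)
    P = [[0.0] * N for _ in range(N)]
    for j in range(1, N):                      # node of rank r = j + 1
        if alpha is None:
            weights = [1.0] * j
        else:
            weights = [(i + 1) ** (-alpha) for i in range(j)]
        S = sum(weights)                       # normalisation S(r)
        for i in range(j):
            P[i][j] = weights[i] / S * dE[j]
    return P

def expected_degree(P, dE):
    """Eq. (3): dE(r) plus the expected links to lower-ranked nodes."""
    N = len(dE)
    return [dE[r] + sum(P[r][j] for j in range(r + 1, N)) for r in range(N)]

dE = [0, 1, 2, 1]
P_egal = linking_probabilities(dE)
P_pref = linking_probabilities(dE, alpha=1.0)
kbar_egal = expected_degree(P_egal, dE)
```

For this toy input the egalitarian expected degrees are 7/3, 7/3, 7/3 and 1, so conserving $\Delta E(r)$ alone does not, in general, reproduce a prescribed degree sequence such as 3, 2, 2, 1.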



3 Maximal Entropy 

First we change the notation by using the label of the links to describe the probability that
node $i$ links with node $j$, that is $p_r = p_{g(i,j)}$, where $g(i,j)$ maps the labels of nodes
$i$ and $j$ to the label $r$ of the link that joins them. We assume that the network is
undirected and has no self-loops, but we allow two nodes to share more than one link. Following
the notation used in [16], the entropy

\[ H(p_1, \ldots, p_{N(N-1)/2}) = -\sum_{m=1}^{N(N-1)/2} p_m \log p_m \tag{7} \]

is maximised under the constraints that the probabilities $p_r$ are normalised, i.e.
$\sum_{r=1}^{N(N-1)/2} p_r = 1$, and that the rich-club connectivity and the degree sequence are
conserved. The normalisation condition can be satisfied if we notice that the total number of
links in the network is $L = \sum_{r} \sum_{r' < r} P(r', r)$, so we consider the probability
$p_{g(i,j)} = P(i,j)/L$. Using the transformation $p_m = \exp(-q_m)$ the constraints become
$\sum_{r=1}^{N(N-1)/2} e^{-q_r} = 1$ and

\[ \sum_{i=1}^{N(N-1)/2} f_r(i)\, e^{-q_i} = m_r, \quad r = 1, \ldots, M, \tag{8} \]

where $m_r$ are $M$ constraints that are related to $q_i$ via the map $f_r(i)$. If the Lagrangian
multipliers are $\lambda_0, \ldots, \lambda_M$ then the partition function is

\[ Z(\lambda_1, \lambda_2, \ldots, \lambda_M) = \sum_{i=1}^{N(N-1)/2} e^{\sum_{j=1}^{M} \lambda_j f_j(i)}, \tag{9} \]

where $\lambda_0 = -\log Z(\lambda_1, \ldots, \lambda_M)$ and
$m_r = \partial(\log Z(\lambda_1, \ldots, \lambda_M))/\partial \lambda_r$. These partial
derivatives are used to construct a set of $M$ non-linear equations

\[ \sum_{i=1}^{N(N-1)/2} (m_r - f_r(i))\, e^{\sum_{j=1}^{M} \lambda_j f_j(i)} = 0, \quad r = 1, 2, \ldots, M. \tag{10} \]

If we substitute $t_j = e^{\lambda_j}$ then Eq. (10) becomes

\[ \sum_{i=1}^{N(N-1)/2} (m_r - f_r(i)) \prod_{j=1}^{M} t_j^{f_j(i)} = 0, \quad r = 1, 2, \ldots, M. \tag{11} \]

and

\[ \lambda_0 = 1 - \log \sum_{i=1}^{N(N-1)/2} \prod_{j=1}^{M} t_j^{f_j(i)}. \tag{12} \]

From the above two equations the MAXENT solutions are

\[ q_r = 1 - \lambda_0 - \sum_{j=1}^{M} \lambda_j f_j(r), \quad r = 1, 2, \ldots, N(N-1)/2, \tag{13} \]

and $q_r = -\log(p_r)$.

3.1 Conserving the rich-club

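To conserve only the rich-club connectivity, the constraints in Eq. (8) are

\[ \sum_{i=1}^{r-1} p_{g(i,r)} = \frac{\Delta E(r)}{L}, \quad r = 2, \ldots, N, \tag{14} \]

where the condition $\Delta E(r)/L$ is to satisfy that the probabilities $p_r$ are normalised.
These constraints can be rewritten as

\[ \sum_{i=1}^{N} h(i, r+1)\, p_{g(i,r+1)} = m_r, \quad r = 1, 2, \ldots, N-1, \tag{15} \]

where $m_r = \Delta E(r+1)/L$ and

\[ h(i,j) = \begin{cases} 1, & i < j, \\ 0, & \text{otherwise.} \end{cases} \tag{16} \]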



By using the properties of the $h(i,j)$ function, $p_{g(i,r+1)} = \exp(-q_{g(i,r+1)})$ and
$t_j = e^{\lambda_j}$, Eq. (11) becomes the set of linear equations

\[ r(m_r - 1)\,t_r + \sum_{i=1,\, i \neq r}^{M} i\, m_r\, t_i = 0, \quad r = 1, \ldots, M. \tag{17} \]

Notice that if $m_r = 0$ then we have the trivial solution $t_r = 0$. From Eq. (13) the
probabilities are $p_{g(i,j)} = e^{1-\lambda_0-\lambda_{j-1}}$, and if we take into consideration
that the rich-club is conserved

\[ \sum_{i=1}^{r-1} p_{g(i,r)} = \sum_{i=1}^{r-1} e^{1-\lambda_0-\lambda_{r-1}} = (r-1)\, e^{1-\lambda_0-\lambda_{r-1}} = \frac{\Delta E(r)}{L}, \tag{18} \]

which implies $e^{1-\lambda_0-\lambda_{r-1}} = \Delta E(r)/((r-1)L)$; then

\[ p_{g(i,r)} = \frac{\Delta E(r)}{(r-1)L}, \quad i < r, \tag{19} \]

showing that the egalitarian case described in Eq. (5) is a solution of the maximal entropy
ensemble.
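
A quick numerical check of this statement is to compare the entropy of Eq. (7) for two distributions that conserve the same $\Delta E(r)$: the egalitarian one of Eq. (19) and a preferential one built from Eq. (6). The sketch below does this for a toy input; the function names and the choice $\alpha = 1$ are assumptions made for illustration only.

```python
import math

def entropy(p_values):
    """Shannon entropy of Eq. (7); terms with p = 0 contribute nothing."""
    return -sum(p * math.log(p) for p in p_values if p > 0)

def link_distribution(dE, alpha=0.0):
    """p_{g(i,j)} = T(i,j) * dE(j) / L with T proportional to i^(-alpha).

    alpha = 0 gives the egalitarian case of Eq. (19).
    """
    L = sum(dE)
    probs = []
    for j in range(1, len(dE)):                 # node of rank j + 1
        w = [(i + 1) ** (-alpha) for i in range(j)]
        S = sum(w)
        probs.extend(wi / S * dE[j] / L for wi in w)
    return probs

dE = [0, 1, 2, 1]
p_egal = link_distribution(dE, alpha=0.0)
p_pref = link_distribution(dE, alpha=1.0)
H_egal = entropy(p_egal)
H_pref = entropy(p_pref)
```

Both distributions satisfy the same rich-club constraints, and for this input `H_egal` is larger than `H_pref`, as expected when the links of each node are spread evenly over the higher-ranked nodes.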



3.2 Conserving the rich-club and degree sequence

To conserve both the rich-club connectivity and the degree sequence we use the constraint given
by Eq. (14) plus the restriction that the degree sequence is conserved, which is given by

\[ \sum_{i=r+1}^{N} p_{g(r,i)} = \frac{k_r - \Delta E(r)}{L}, \quad r = 1, \ldots, N, \tag{20} \]

and can be written as

\[ \sum_{i=1}^{N} h(r, i)\, p_{g(r,i)} = m_{N-1+r}, \quad r = 1, 2, \ldots, N-1, \tag{21} \]



where $m_{N-1+r} = (k_r - \Delta E(r))/L$. We can interpret Eqs. (14) and (21) as the condition
that, given node $r$, its links are explicitly divided into the ones that connect with nodes of
higher degree ($\Delta E(r)$) and the ones that connect with nodes of lower degree
($k_r - \Delta E(r)$). Again, by manipulating the constraints, Eq. (11) becomes the set of
non-linear equations

\[ \sum_{j=1}^{N-1} \left( \sum_{i=j}^{N-1} (m_r - \delta(i, r))\, t_i \right) t_{N-1+j} = 0, \quad r = 1, \ldots, N-1, \tag{22} \]

and Eq. (21) becomes

\[ \sum_{j=1}^{N-1} \left( \sum_{i=j}^{N-1} (m_r - \delta(N-1+j, r))\, t_i \right) t_{N-1+j} = 0, \quad r = N, \ldots, 2(N-1), \tag{23} \]

where $\delta(i,j)$ is the Kronecker delta.



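From $q_i = 1 - \lambda_0 - \sum_{r=1}^{M} \lambda_r f_r(i)$, the MAXENT probabilities are
$p_{g(i,j)} = e^{1-\lambda_0-\lambda_{j-1}-\lambda_{N-1+i}}$. The conservation of the rich-club
implies that

\[ \sum_{i=1}^{j-1} e^{1-\lambda_0-\lambda_{j-1}-\lambda_{N-1+i}} = e^{1-\lambda_0-\lambda_{j-1}} \sum_{i=1}^{j-1} e^{-\lambda_{N-1+i}} = \frac{\Delta E(j)}{L}. \tag{24} \]

If we define the function $w(i) = e^{-\lambda_{N-1+i}}$ then the probability mass function can be
written as

\[ p_{g(i,j)} = \frac{w(i)}{\sum_{m=1}^{j-1} w(m)}\, \frac{\Delta E(j)}{L} = T(i,j)\, \frac{\Delta E(j)}{L}, \tag{25} \]

where $e^{1-\lambda_0-\lambda_{j-1}} \sum_{m=1}^{j-1} w(m) = \Delta E(j)/L$ and
$T(i,j) = w(i)/\sum_{m=1}^{j-1} w(m)$.

Notice that if the function $w(i)$ for $i = 1, \ldots, N$ is known then we can describe the
probability of connection between all the nodes in the network. Knowing the functional form of
$w(i)$ is equivalent, up to a constant factor, to knowing $T(i, N)$ for $i = 1, \ldots, N$.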
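
Equation (25) says that the MAXENT probabilities factorise into a term that depends only on the lower-ranked node $j$ and a term $w(i)$ that depends only on the higher-ranked node $i$. For small networks, a solution of this product form can be obtained with iterative proportional fitting, which rescales the rows and columns of a strictly upper-triangular matrix until both sets of constraints hold. This is a swapped-in technique, not the Lagrange-multiplier solver described in the text; the function name, the uniform starting matrix and the stopping rule are assumptions of this sketch, and it presumes that the constraints are feasible.

```python
def ipf_rich_club_degree(k, dE, iters=10000, tol=1e-12):
    """Expected link counts P[i][j] = L * p_{g(i,j)} for i < j (0-based ranks).

    Row i must sum to k[i] - dE[i]  (links to lower-ranked nodes, Eq. (20)).
    Column j must sum to dE[j]      (links to higher-ranked nodes, Eq. (14)).
    """
    n = len(k)
    row_target = [k[i] - dE[i] for i in range(n)]
    col_target = list(dE)
    # Uniform start on the allowed entries i < j.
    P = [[1.0 if i < j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iters):
        for i in range(n):                       # rescale rows
            s = sum(P[i])
            if s > 0:
                f = row_target[i] / s
                P[i] = [x * f for x in P[i]]
        for j in range(n):                       # rescale columns
            s = sum(P[i][j] for i in range(n))
            if s > 0:
                f = col_target[j] / s
                for i in range(n):
                    P[i][j] *= f
        err = max(abs(sum(P[i]) - row_target[i]) for i in range(n))
        if err < tol:
            break
    return P

k = [3, 2, 2, 1]
dE = [0, 1, 2, 1]
P = ipf_rich_club_degree(k, dE)
```

For this toy input the expected number of links between the nodes of rank 1 and 3 is 4/3, which illustrates that the ensemble allows two nodes to share more than one link on average.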



4 Examples 

To study the properties of the linking term $T(i, N)$ and how it relates to the network structure
we evaluated numerically the MAXENT solution for the following real networks.



4.1 Complex Network co-authorship 

The network data is the giant component of the network of scientists working in the field of
Complex Networks as collected by Newman [17]. Here, we consider the network as unweighted and
undirected. This network has some characteristics that are interesting in our context. From
the average degree-degree correlation the network cannot be classified as assortative or
disassortative (Fig. 1(a)). The nodes of degree one do not share any links with nodes of degree two.

Figure 1: Co-authorship network. (a) Average degree-degree correlation (closed squares: data,
open circles: MAXENT solution), (b) degree sequence, (c) number of links $\Delta E(r)$ shared
between node $r$ and nodes $r' < r$ and (d) the linking term $T(r, N)$ obtained from the MAXENT
solution. The network has 379 nodes and 916 links.



There is a tendency of nodes of similar low degree to connect with each other. This is expected
as, by construction, all the co-authors of a paper are represented by a clique; that is, if there
is an article with four authors, the degree of each of these nodes is at least three and all
these nodes are inter-connected. We use these properties of the network as a comparison point
with the MAXENT solution. Figure 1(b) shows the degree sequence which, for small $r$, decays like
a power law, and Fig. 1(c) the number of links $\Delta E(r)$ between node $r$ and nodes
$r' \in [1, r-1]$. The nodes are ranked in decreasing order of their degree.

Figure 1(d) shows $T(r, N)$ obtained from the numerical solution of MAXENT. As previously
mentioned in Eq. (25), this function describes the probability mass function. We noticed that for
small values of the rank, $T(r, N)$ decays in a similar fashion to the degree sequence, in this
case proportional to $r^{-0.4}$, shown as a solid line in Fig. 1(d). This reflects the property
that there is a preferential attachment between the top ranked nodes.

As the value of $r$ increases $T(r, N)$ decreases, and for approximately $r > 40$ it increases
again and has a "stepped" shape. The increase as $r$ tends to $N$ shows that there is a
preferential attachment between the low ranking nodes. The MAXENT solution captures the property
that there is a tendency of low degree nodes to connect with low degree nodes. This reflects the
property that there are papers with a small number of authors that form cliques.

The MAXENT solution also reproduces the behaviour of the average degree-degree correlation
$k_{nn}(k)$, which is shown in Fig. 1(a) (open circles). For small $k$ the discrepancy between the
data and the MAXENT solution arises because the MAXENT solution predicts that nodes with degree
one and degree two should share some links. These kinds of links are not present in the network
data.

Figure 2: Co-authorship network. (a) Smoothing of the linking term $T(r, N)$. (b) The smooth
linking term obtained using two different ranking schemes, shown as open circles and solid
triangles.

To understand the stepped behaviour of $T(r, N)$ we have marked in Fig. 1(d) with dashed lines
the values of $r$ where the degree of the nodes changes. The first observation is that if $k_r$
is the degree of node $r$ and $\Delta E(r)$ is the number of links that connect to nodes of higher
rank, then $k_r - \Delta E(r)$ is the number of links that connect to nodes of lower rank. If
$k_r - \Delta E(r) = 0$, no node $r' > r$ shares a link with $r$. In other words, $T(r, N) = 0$ if
$k_r - \Delta E(r) = 0$, which are the zeros shown in Fig. 1(d).

The second observation can be explained using the case $k_r = 3$ marked in Fig. 1(d). Consider
the case $k_r = 3$ where $k_r - \Delta E(r) = i$ and $i > 0$. If $i = 1$, then of the three links
that node $r$ has, only one connects to nodes with rank $r' > r$; we denote the probability of
this happening as $p$. If $i = 2$ then there are two links that can connect node $r'$ and $r$. If
the MAXENT solution is unbiased then the probability that node $r$ connects with $r'$ is $p + p$,
that is, the probability that one of the free links connects the two nodes plus the probability
that the other free link connects the two nodes. In Fig. 1(d), when $k_r = 3$, the case
$k_r - \Delta E(r) = 1$ corresponds to the lower step and $k_r - \Delta E(r) = 2$ to the upper
step. The implication of this observation is that we can describe $T(r, N)$ via a non-stepped
function. If we introduce the function $s(r) = T(r, N)/(k_r - \Delta E(r))$, where
$k_r - \Delta E(r) \neq 0$, this function lies on a smooth curve, see Fig. 2(a).

The third observation, obtained when fitting $s(r)$ to the data using least squares, is that
$s(r)$ can be approximated well via $A/(N - r) + B$, where $A$ and $B$ are parameters. We can
partially justify this observation by noticing that if there are $N - r$ possible nodes to which
a link can be attached, then the unbiased probability that we attach this link to any of these
nodes is proportional to $1/(N - r)$.
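
The smoothing and the least-squares fit can be sketched as follows. Since $s(r) = A/(N - r) + B$ is linear in $x = 1/(N - r)$, ordinary least squares has a closed form. The function names are hypothetical, and the values used in the example are synthetic, generated only to exercise the code; they are not data from the paper.

```python
def smooth_linking_term(T, k, dE):
    """Return the ranks r (1-based) and s(r) = T(r, N) / (k_r - dE(r)) where defined."""
    ranks, s = [], []
    for r, (t, kr, de) in enumerate(zip(T, k, dE), start=1):
        if kr - de != 0:
            ranks.append(r)
            s.append(t / (kr - de))
    return ranks, s

def fit_s(ranks, s_values, N):
    """Least-squares estimate of A and B in s(r) = A / (N - r) + B (needs r < N)."""
    x = [1.0 / (N - r) for r in ranks]
    n = len(x)
    mean_x = sum(x) / n
    mean_s = sum(s_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxs = sum((xi - mean_x) * (si - mean_s) for xi, si in zip(x, s_values))
    A = sxs / sxx
    B = mean_s - A * mean_x
    return A, B

# Synthetic check: generate s(r) from known parameters and recover them.
N = 50
ranks = list(range(1, 41))
s_true = [2.0 / (N - r) + 0.5 for r in ranks]
A_fit, B_fit = fit_s(ranks, s_true, N)
```

Ranks with $k_r - \Delta E(r) = 0$ are skipped, because $s(r)$ is not defined there.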

Our final observation is that the MAXENT solution depends on the way that the nodes are
ranked. There is an ambiguity when labelling the nodes via a degree-dependent rank. For high
degree nodes this is not a problem, as the degrees tend to be unique so the rank labels these
nodes unambiguously. For lower degree nodes, there are many nodes with the same degree. In
this case the labelling of the nodes is not unique. However, when we evaluated the MAXENT
solution using different ranking schemes for the nodes with equal degree, we noticed that the
behaviour of $s(r)$ is independent of the ranking scheme, see Fig. 2(b).



Putting these observations together with Eq. (25) implies that the probability that node $i$ is
connected to node $j$ is of the form

\[ p_{g(i,j)} = \begin{cases} \dfrac{s(i)\,(k_i - \Delta E(i))}{\sum_{n=1}^{j-1} s(n)\,(k_n - \Delta E(n))} \left( \dfrac{\Delta E(j)}{L} \right), & i < j, \\ 0, & \text{otherwise,} \end{cases} \tag{26} \]

where $s(i) \approx A/(N - i) + B$. This is the main result of this paper.

Figure 3: Airports network. (a) Average degree-degree correlation of the airports network
(solid squares) and the null-model (open circles) from the MAXENT solution. (b) The degree
sequence, which decays exponentially fast for small values of $r$. (c) Linking term $T(r, N)$,
which for small $r$ decays similarly to the degree sequence. (d) The links between the first 37
nodes and node 38 are marked with solid triangles, together with the average number of links
obtained from the null-model (open circles) and its standard deviation (dotted lines).



4.2 Airports 

In the US air transportation network the nodes are airports and an edge represents a direct
flight between the two airports [18]. Fig. 3(a) shows that the average degree-degree correlation
of the network (solid squares) is well approximated by the MAXENT solution (open circles).
Again, as in the case of the co-authorship network, the MAXENT solution predicts more links
between nodes of low degree. This is also clear in Fig. 3(c), which shows that for high values
of $r$ (low degree) the probability of connection increases. Also notice that, as in the case of
the co-authorship network, $T(r, N)$ behaves similarly to the degree distribution for small $r$.

We are interested in this network because the top 18 airports are fully connected. This
network is an example where other randomisation techniques could introduce correlations when
generating a null-model network if multiple links between nodes are not allowed [19]. As we
have not put any restrictions in the MAXENT formulation on the number of links that
two nodes can share, we use the airport network to show that the MAXENT solution predicts that,
on average, high degree nodes share more than one link. An example of this is shown in Fig. 3(d),
where the number of links between the node with rank 38 and the nodes with ranks 1 to 37 are
marked with solid triangles, together with the average number of links obtained using Eq. @ and
the MAXENT solution (open circles) and the standard deviation (dashed lines). As expected, the
null-model predicts that on average two nodes could share more than one link.

The results from the co-authorship and airports networks show that the method to construct
surrogates that conserve the rich-club using a preferential attachment [13] (see Eq. (6)) is
biased and only works when the degree sequence decays as a power law. The bias arises because
the surrogates introduce correlations between nodes of low degree. Hence these surrogates are
useful when studying correlations between high degree nodes of power law networks.



4.3 Larger networks 

Finding the MAXENT solution for medium to large networks numerically can be very challenging.
However, we can use Eq. (26) to approximate $p_{g(i,j)}$ with reasonable accuracy using the
following procedure:

• measure $k_r$ and $\Delta E(r)$ from the network under consideration,

• propose a function to approximate $T(r, N)$, for example using $s(r) \approx A/(N - r) + B$,

• find the values of $A$ and $B$ by minimising
$\eta = \frac{1}{N} \sum_{r=1}^{N} ((\bar{k}_r - k_r)/k_r)^2$, where $\bar{k}_r$ is obtained
using Eq. (3).
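
The procedure above can be sketched in a few lines. The code below evaluates the expected degrees from Eq. (26) and Eq. (3) for given $A$ and $B$ and computes the error $\eta$. The function names and the coarse grid of candidate parameters are assumptions of this example; a real application would use a proper numerical optimiser.

```python
def expected_degrees(k, dE, A, B):
    """Average degrees from Eq. (3), with link probabilities given by Eq. (26)."""
    N = len(k)
    # u(i) = s(i) * (k_i - dE(i)); it is zero when node i has no links to lower ranks.
    u = [0.0] * N
    for i in range(1, N):                        # ranks 1 .. N-1, so N - i > 0
        free = k[i - 1] - dE[i - 1]
        if free > 0:
            u[i - 1] = (A / (N - i) + B) * free
    kbar = [float(x) for x in dE]                # links to higher-ranked nodes
    cum = 0.0                                    # sum of u(n) over ranks n < j
    for j in range(1, N + 1):
        if dE[j - 1] > 0 and cum > 0:
            for i in range(1, j):
                # Expected number of links between ranks i and j: L * p_{g(i,j)}.
                kbar[i - 1] += u[i - 1] / cum * dE[j - 1]
        cum += u[j - 1]
    return kbar

def eta(k, dE, A, B):
    """Mean squared relative error between expected and observed degrees."""
    kbar = expected_degrees(k, dE, A, B)
    return sum(((kb - kr) / kr) ** 2 for kb, kr in zip(kbar, k)) / len(k)

k = [3, 2, 2, 1]
dE = [0, 1, 2, 1]
candidates = [(1.0, 0.0), (1.0, 0.1), (1.0, 1.0), (0.0, 1.0)]
best = min(candidates, key=lambda ab: eta(k, dE, ab[0], ab[1]))
```

In Eq. (26) the function $s$ appears in both the numerator and the denominator, so multiplying $A$ and $B$ by a common factor leaves the probabilities unchanged; only the relative size of the two parameters matters for the fit.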

We tested this approach with the word association network [20], which consists of 10,572 nodes
and 72,175 links. The fit gave values of $A$ and $B$ for which the average relative error was
$\eta = 1 \times 10^{-4}$. The largest discrepancy between the degree of the network data and the
null-model was two links. Fig. 4(a) shows that the average degree-degree correlation of the
original data (solid squares) is well approximated by the null-model (open circles). Again we
noticed that for low degrees there is a discrepancy between the original network and the
null-model, suggesting that the null-model has more links between nodes of low degree and nodes
of high degree. To verify if this is the case we evaluated the degree-degree frequency for the
links that have at one end a node with degree one, shown in Fig. 4(b). The data shows that nodes
of degree one tend to connect with nodes of degree 19. This is also the case for the null-model,
where the mode of the distribution is also 19. However, the null-model also shows a small
tendency for nodes of degree one to connect with nodes of higher degree; this tendency is not
present in the original network.



5 Concluding Remarks 

The main result on the analysis of the MAXENT solution when the rich-club coefficient is
conserved is the formula (26), which shows that the linking probabilities between nodes can be
described by a specific probability mass function. One of the main advantages of knowing the
shape of the probability mass function is that it can be used to approximate the MAXENT
solution without resorting to the Lagrangian multipliers method, which generates a large set
of non-linear equations that are difficult to solve numerically.

For the networks studied here, the MAXENT ensemble captures the preferential connectivity
with nodes of high degree and, as shown in the case of the co-authorship network, also a
preferential linking between nodes of low degree. As the method does not put restrictions on
the number of links that two nodes can share, it can be used to analyse networks that have a
densely connected rich-club. The method gives a good approximation to the average
degree-degree distribution but it is more general, as it can describe in more detail the
connectivity between nodes with different degrees, as was shown in the case of the word
association network.

Figure 4: Word association network. (a) Comparison of the average degree-degree correlation
obtained from the data (filled squares) and the one obtained from the approximation of the
MAXENT solution (open circles). (b) Degree-degree frequency of the links with a node of
degree one at one end, where the network data is shown with filled squares and the null-model
with open circles.